Chris Bail
Computational Sociology
Duke University
Last class we explored the exciting new field of machine learning and asked how it might be applied to sociology. I argued that Generalized Additive Models, Regression Trees, and Random Forests can allow you to better understand causal complexity and non-linearity in your data, even if we are unlikely to see papers that present results using these models in ASR or AJS in the near future.
One of the reasons why I argued that machine learning has not caught on within sociology as much as in other disciplines is that it is fundamentally about making predictions. Agent-based models, by contrast, assume that prediction is inherently impossible because most social systems are complex and adaptive, or “stochastic.” Instead of predicting an outcome, we therefore use ABMs to simulate many different possible outcomes, and use these simulations both to A) develop better theories and B) create better ways of testing them.
Agent-based models have a long history in sociology (going back to Thomas Schelling's model in the Journal of Mathematical Sociology). Since then, the person who has pushed ABM forward more within sociology than anyone else is Michael Macy. Many of his students remain top leaders in this field as well: e.g. James Kitts, Damon Centola, Arnout Van de Rijt, etc.
1) An Introduction to Complexity Theory
2) An Introduction to Agent-Based Models
3) Coding Agent-Based Models in NetLogo
Basic characteristics of complex adaptive systems:
1) Made up of a network of interacting agents
2) Agents are always acting and reacting to the behavior of others, as well as the entire system.
3) Individuals can change the environment, which in turn affects the behavior of other individuals.
Situations where micro-level patterns of interaction generate macro-level social structures that cannot be explained by individual behavior alone.
Examples abound: segregation, protests, improvisation, diffusion of innovations, political polarization, scientific revolutions, etc.
Much of science still rests on mechanistic or reductionist approaches in which scholars attempt to derive universal laws.
Mechanistic approaches use very elegant mathematical models, but these models are generally unable to account for complex, non-linear interactions between units of analysis.
What is more, interactions can change the agents themselves in unpredictable ways. Mechanistic approaches therefore either fail to account for the full range of outcomes that might occur or make highly inaccurate predictions, because shifting relationships between actors and their environment can transform the behavior of the actors themselves.
Approaches based on complex adaptive systems, by contrast, assume that such deduction or prediction is impossible, or that most systems are highly “stochastic.”
Note that this is not just a fringe idea: this is a central tenet of quantum physics and evolutionary biology.
Why hasn't social science caught on?
One can argue that human behavior is even more unpredictable than that of animals!
ABM enables both inductive and deductive approaches, and is perhaps most fruitful when the two approaches are combined. Michael Macy has called this “theory mining.”
ABMs are particularly useful for theorizing processes that we cannot observe. For example, many of the issues we often talk about surrounding culture and cognition cannot be easily measured in surveys or by observing behavior. Whereas a survey might use repeated observation to try to develop a proxy measure for what an individual is thinking, an ABM would explicitly describe a decision making process (or interpretation process), and examine different outcomes when individuals are faced with different circumstances/choices.
An agent-based model is a computer program in which individual agents or actors follow rules. These rules range from very simple (e.g. people will always maximize their self-interest) to very complex (e.g. people only maximize their self-interest if they think other people are doing the same thing).
Models are a simplification of reality. The question is how wrong are they? And what can we learn from the discrepancy between observed and simulated data?
There are many different ways to simplify reality.
In order to simplify reality, we need to make assumptions. These are usually too simple, but we can change the assumptions and see how they affect the model. Once again, this gives us important information about how fragile the assumptions are.
On the other hand, making too many assumptions can be problematic, and some assumptions may be more important than others.
Once again, however, we can build agent-based models in an iterative fashion, constantly tweaking our assumptions, or giving them different amounts of influence within the model.
Let's take a few minutes to examine some examples of agent-based models using the NetLogo software.
Please go here to download NetLogo:
https://ccl.northwestern.edu/netlogo/
If you are on a Mac and unable to open the program, use this patch:
The segregation model was developed by the economist Thomas Schelling in order to explain the persistence of racial segregation in the US.
It explains why racial segregation can occur even if people in a society have only a mild preference for living near people of their own race/ethnicity.
Press the “setup” button in order to initialize the simulation
Press the “go once” button in order to simulate one step through time, or the “go” button to keep the simulation running until you press it again or the model stops itself.
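Use the “% similar wanted” slider in order to explore different assumptions about how strongly people prefer neighbors from their own group (the slider sets the minimum percentage of same-color neighbors that each agent wants).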
The output appears here in the lower left corner of the screen, and you can export it by “right clicking” on these graphs:
Please load the AIDS model from the Models Library
There are many more models within the Modeling Commons:
http://modelingcommons.org/account/login
I created a folder in our Dropbox called “Agent-Based Modeling Resources” with NetLogo models for many of the most recent papers in AJS and ASR that use ABM (most of these are authored by James Kitts and his students. Thanks, James!)
I also searched through the Modeling Commons in order to pick out some interesting social science models and put them in a folder entitled “Random Social Science Agent Based Models that are not in the Models Library.”
Most papers that use ABM do not describe how they implement the code. This is unfortunate because it makes it seem like very high-level computing knowledge is necessary to perform ABM. This is emphatically NOT the case.
1) In NetLogo
2) Within R
3) The new RNetLogo package enables you to load NetLogo from within R and export the output directly. This is advantageous if you want to run a large number of simulations that differ in slight ways and assess how sensitive the model is to different assumptions/parameters. However, some of you may prefer to just keep everything in R. It depends how important it is for you to interact with your visualization and view it changing over time (R can graph these things, but its animation capabilities are much trickier to use than those in NetLogo).
Unfortunately there are currently some issues getting this running on Yosemite. See fix here:
http://stackoverflow.com/questions/26618105/rnetlogo-not-working-on-mac-yosemite
The modeling process usually does not begin with a researcher typing code into a computer program.
Before you start coding you will have to answer a series of extremely important questions:
1) What is your research question?
2) Who are the agents in your model?
3) What type of rules might they follow?
4) What are your hypotheses?
5) Do you need to produce visual output in one “run,” or are you interested in using simulations to assess error across hundreds or thousands of runs?
NetLogo code, like R code, is organized around objects. As a result, one must first define a set of objects that work together in order to form an agent-based model: for example, a list of the actors and their attributes, rules for decision-making that they might follow, and “global” variables that help you identify how such micro-level behavioral rules bubble up into the macro-level patterns you might care about (such as the overall segregation rate in the Schelling Model).
NetLogo was originally designed to analyze patterns of emergence within ecological systems.
Instead of using the term “agents,” the author of NetLogo opted for the term “turtles.”
In NetLogo, a semicolon begins a comment (by convention, “;;”), and comments usually appear on the same line as the code:
some code here ;; this is a comment
1) Define Global Variables;
2) Define Attributes of Agents (Turtles);
3) Define Society or Social Space where Agents Interact;
4) Define Interaction Rules for Agents;
5) Add Buttons or Sliders to Visualization Window to facilitate experimenting with different parameters/assumptions.
These are the macro-level variables we want to observe from our model and/or output to analyze or visualize in R. For the Schelling Model, we have two global variables: the average percentage of each agent's neighbors who belong to the same group, and the percentage of agents who are unhappy with their current residential location because too few of their neighbors are from their own group:
globals [
  percent-similar  ;; on the average, what percent of a turtle's neighbors
                   ;; are the same color as that turtle?
  percent-unhappy  ;; what percent of the turtles are unhappy?
]
To describe attributes of the turtles, we use the turtles-own [ ] declaration. For each turtle, we are going to need to know the following information:
turtles-own [
  happy?          ;; for each turtle, indicates whether at least %-similar-wanted percent of
                  ;; that turtle's neighbors are the same color as the turtle
  similar-nearby  ;; how many neighboring patches have a turtle with my color?
  other-nearby    ;; how many have a turtle of another color?
  total-nearby    ;; sum of previous two variables
]
The to setup ... end block defines a function (NetLogo calls these “procedures”) that describes how you want the society or system to be created.
to setup
  clear-all
  ;; create turtles on random patches.
  ask patches [
    if random 100 < density [ ;; set the occupancy density
      sprout 1 [
        set color one-of [red green]
      ]
    ]
  ]
  update-variables
  reset-ticks
end
Note the ask patches [ ] command, which specifies that we want to create the world as a set of patches where actors can move (but we could have defined the world as a network of nodes, among other options).
ask patches [
  if random 100 < density [ ;; set the occupancy density
    sprout 1 [
      set color one-of [red green]
    ]
  ]
]
density is a variable we've created that sets the percentage of patches that should be occupied by a turtle (or agent).
For each patch, random 100 draws a whole number from 0 to 99. If that number is less than the value of “density” (which we are going to define later), the sprout command creates one turtle on that patch, and one-of randomly sets the new turtle's color to either red or green.
So far we've established
1) What we want to learn from the model (globals)
2) Characteristics of the agents in the model (turtles-own)
3) The boundaries of the society (patches/sprout)
Now we need to define the rules that govern the behavior of the agents (turtles) as they choose where to live.
In order to do this, we are going to have to wrap some functions within functions, so bear with me… First, we need to learn about the go function, which corresponds to the “Go” or “Go Once” button on the Graphical User Interface we played with earlier.
to go
  if all? turtles [ happy? ] [ stop ]
  move-unhappy-turtles
  update-variables
  tick
end
Remember that we defined happy? as an attribute of all turtles, so the first line is just saying, “if all the turtles are happy, end the simulation.”
if all? turtles [ happy? ] [ stop ]
The rest of this section calls other functions established later in the code, called move-unhappy-turtles and update-variables. The tick command then advances the clock by one unit of simulated time, perhaps a second.
move-unhappy-turtles
update-variables
tick
end
The move-unhappy-turtles function is going to create a lot of action in this simulation because most of the time there are at least some people/agents who are not happy with where they live because they live near people who are from the other group.
to move-unhappy-turtles
  ask turtles with [ not happy? ]
    [ find-new-spot ]
end
Note that the to here defines this as a separate function. find-new-spot is yet another function that is created further down the page of code.
Next, we randomly move the turtles to patches in the “society” until we find one that is not occupied.
to find-new-spot
  rt random-float 360
  fd random-float 10
  if any? other turtles-here [ find-new-spot ] ;; keep going until we find an unoccupied patch
  move-to patch-here ;; move to center of patch
end
The random-float function creates a decimal number greater than or equal to zero and less than the number that follows it (in this case between 0 and 360, and between 0 and 10).
rt means turn the turtle to the right (clockwise) by a given number of degrees, so rt random-float 360 points the turtle in a random direction. fd means move the turtle forward in the direction it is facing, so fd random-float 10 moves it a random distance of up to 10 patches.
rt random-float 360
fd random-float 10
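A rough Python sketch of this search may make the turn-then-step logic easier to follow. It is a simplified stand-in rather than a translation of NetLogo's internals: the agent dictionary, the occupied set, and the convention that patch (i, j) covers coordinates from i up to i + 1 are assumptions made for this example, and a loop replaces the recursive call.

```python
import math
import random

def find_new_spot(agent, occupied, size, rng):
    """Hop in random directions until the agent lands on an unoccupied patch.

    `occupied` holds the patches taken by OTHER agents, mirroring
    `other turtles-here`. At least one patch must be free.
    """
    while True:
        # rt random-float 360: turn clockwise by a random angle in [0, 360).
        agent["heading"] = (agent["heading"] + rng.random() * 360) % 360
        # fd random-float 10: move forward a random distance in [0, 10).
        step = rng.random() * 10
        angle = math.radians(agent["heading"])
        # Compass-style headings: 0 points up, 90 points right.
        agent["x"] = (agent["x"] + step * math.sin(angle)) % size
        agent["y"] = (agent["y"] + step * math.cos(angle)) % size
        patch = (int(agent["x"]) % size, int(agent["y"]) % size)
        if patch not in occupied:
            # move-to patch-here: snap to the center of the patch.
            agent["x"], agent["y"] = patch[0] + 0.5, patch[1] + 0.5
            return patch

rng = random.Random(1)
size = 5
# Every patch is taken except the agent's own patch (2, 2) and patch (4, 0).
occupied = {(x, y) for x in range(size) for y in range(size)} - {(2, 2), (4, 0)}
agent = {"x": 2.5, "y": 2.5, "heading": 0.0}
new_patch = find_new_spot(agent, occupied, size, rng)
print("agent settled on patch", new_patch)
```

Note that the agent can end up back on its own patch if a hop happens to be short, which is also true of the NetLogo version.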
Next, we need to periodically change the characteristics of our turtles/agents (whether they are happy with their location or not), as well as our global variables (e.g. the number of turtles/agents who are happy)
to update-variables
update-turtles
update-globals
end
Once again, we are calling functions that are further down the page of code called update-turtles and update-globals
to update-turtles
  ask turtles [
    set similar-nearby count (turtles-on neighbors) with [ color = [ color ] of myself ]
    set other-nearby count (turtles-on neighbors) with [ color != [ color ] of myself ]
    set total-nearby similar-nearby + other-nearby
    set happy? similar-nearby >= (%-similar-wanted * total-nearby / 100)
    if visualization = "old" [ set shape "default" ]
    if visualization = "square-x" [
      ifelse happy? [ set shape "square" ] [ set shape "square-x" ]
    ]
  ]
end
Here we are simply counting how many of the turtles near each turtle are the same color, and testing whether the proportion of similar turtles satisfies the preferences we will set up in a later stage (a variable called %-similar-wanted).
set similar-nearby count (turtles-on neighbors) with [ color = [ color ] of myself ]
set other-nearby count (turtles-on neighbors) with [ color != [ color ] of myself ]
set total-nearby similar-nearby + other-nearby
set happy? similar-nearby >= (%-similar-wanted * total-nearby / 100)
Note that the code is able to check the color of nearby turtles because every turtle has a built-in color attribute, which we set at an earlier stage (in setup).
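The happiness rule is easy to check by hand with a small Python sketch. The names neighbors and is_happy, the dictionary-based world, and the wrap-around grid of size 5 are assumptions made for this illustration.

```python
def neighbors(x, y, size):
    """The eight surrounding patches, wrapping around the edges of the world."""
    return [((x + dx) % size, (y + dy) % size)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]

def is_happy(world, pos, pct_similar_wanted, size):
    """Apply the rule: similar-nearby >= %-similar-wanted * total-nearby / 100."""
    my_color = world[pos]
    nearby = [world[n] for n in neighbors(pos[0], pos[1], size) if n in world]
    similar_nearby = sum(1 for color in nearby if color == my_color)
    total_nearby = len(nearby)
    return similar_nearby >= pct_similar_wanted * total_nearby / 100

# A red agent at (1, 1) with one red and two green neighbors.
world = {(1, 1): "red", (0, 1): "red", (2, 1): "green", (1, 2): "green"}
print(is_happy(world, (1, 1), 30, 5))  # 1 >= 0.9, so True
print(is_happy(world, (1, 1), 50, 5))  # 1 >= 1.5 fails, so False
```

An agent with no neighbors at all counts as happy under this rule, because zero similar neighbors is still at least zero percent of zero.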
The code is also changing the shapes used to describe each turtle's state of happiness (with this code, we use an “x” to mark turtles that are unhappy):
if visualization = "old" [ set shape "default" ]
if visualization = "square-x" [
  ifelse happy? [ set shape "square" ] [ set shape "square-x" ]
]
The visualization variable lets the user decide whether or not to turn on this visualization feature.
Finally, we update the global variables so that we can generate our output (the graphs in the lower left hand side of the GUI)
to update-globals
  let similar-neighbors sum [ similar-nearby ] of turtles
  let total-neighbors sum [ total-nearby ] of turtles
  set percent-similar (similar-neighbors / total-neighbors) * 100
  set percent-unhappy (count turtles with [ not happy? ]) / (count turtles) * 100
end
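To see how the pieces fit together (setup, go, moving unhappy agents, and updating the globals), here is a compact Python sketch of the whole loop. It is a simplified stand-in for the NetLogo model, not a translation: unhappy agents jump straight to a randomly chosen empty patch instead of wandering, and the function names, the 20 x 20 wrap-around grid, and the default parameter values are assumptions chosen for this example.

```python
import random

def neighbors(pos, size):
    x, y = pos
    return [((x + dx) % size, (y + dy) % size)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

def update_turtles(world, size, pct_wanted):
    """Return {position: (similar_nearby, total_nearby, happy)} for every agent."""
    stats = {}
    for pos, color in world.items():
        nearby = [world[n] for n in neighbors(pos, size) if n in world]
        similar = sum(1 for c in nearby if c == color)
        happy = similar >= pct_wanted * len(nearby) / 100
        stats[pos] = (similar, len(nearby), happy)
    return stats

def update_globals(stats):
    """Return (percent_similar, percent_unhappy), pooled over all agents."""
    similar = sum(s for s, _, _ in stats.values())
    total = sum(t for _, t, _ in stats.values())
    unhappy = sum(1 for _, _, happy in stats.values() if not happy)
    # Like the NetLogo code, this assumes at least one agent has a neighbor.
    return 100 * similar / total, 100 * unhappy / len(stats)

def run(size=20, density=80, pct_wanted=30, max_ticks=200, seed=0):
    rng = random.Random(seed)
    world = {(x, y): rng.choice(["red", "green"])
             for x in range(size) for y in range(size)
             if rng.randrange(100) < density}
    history = []
    for _ in range(max_ticks):
        stats = update_turtles(world, size, pct_wanted)
        history.append(update_globals(stats))
        unhappy = [pos for pos, (_, _, happy) in stats.items() if not happy]
        if not unhappy:
            break  # everyone is happy, so the simulation stops
        for pos in unhappy:
            empty = [(x, y) for x in range(size) for y in range(size)
                     if (x, y) not in world]
            # Simplification: jump directly to a random empty patch.
            world[rng.choice(empty)] = world.pop(pos)
    return history

history = run()
print("start: %.1f%% similar, %.1f%% unhappy" % history[0])
print("end:   %.1f%% similar, %.1f%% unhappy" % history[-1])
```

Comparing the first and last entries of history shows the macro-level pattern: even with agents who only want 30 percent of their neighbors to be like them, the overall share of similar neighbors rises well above where it started.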
 
Remember how we did not define the density variable or the %-similar-wanted variable? We can do that by adding “sliders” within the GUI's “Interface” pane (the main visualization window).
1) Select the type of interactive feature you want (slider, button, plot etc)
2) Click the “add” button
3) Place the cursor where you want to put the button/slider and click the mouse
Agents can either become infected with the virus or resistant to the virus, and they get a virus check within a regular time period defined by the user.
turtles-own
[
  infected?          ;; if true, the turtle is infectious
  resistant?         ;; if true, the turtle can't be infected
  virus-check-timer  ;; number of ticks since this turtle's last virus-check
]
Once again, in order to set-up this “world”, which is a network, we are going to call some functions that are defined later on in the code.
to setup
  clear-all
  setup-nodes
  setup-spatially-clustered-network
  ask n-of initial-outbreak-size turtles
    [ become-infected ]
  ask links [ set color white ]
  reset-ticks
end
to setup-nodes
  set-default-shape turtles "circle"
  crt number-of-nodes
  [
    ; for visual reasons, we don't put any nodes *too* close to the edges
    setxy (random-xcor * 0.95) (random-ycor * 0.95)
    become-susceptible
    set virus-check-timer random virus-check-frequency
  ]
end
to setup-spatially-clustered-network
  let num-links (average-node-degree * number-of-nodes) / 2
  while [count links < num-links ]
  [
    ask one-of turtles
    [
      let choice (min-one-of (other turtles with [not link-neighbor? myself])
                   [distance myself])
      if choice != nobody [ create-link-with choice ]
    ]
  ]
  ; make the network look a little prettier
  repeat 10
  [
    layout-spring turtles links 0.3 (world-width / (sqrt number-of-nodes)) 1
  ]
end
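The network-building step above can be sketched in Python as follows. Each randomly chosen node links to the closest node it is not yet linked to, until the network reaches the target number of links. The function name, the unit-square coordinates, and the adjacency-set representation are assumptions made for this example; the sketch also uses plain straight-line distance and skips the spring layout, which only affects appearance.

```python
import math
import random

def spatially_clustered_network(n, avg_degree, rng):
    """Return (coords, links), where links[i] is the set of nodes tied to node i."""
    coords = [(rng.random(), rng.random()) for _ in range(n)]
    links = {i: set() for i in range(n)}
    # Each link adds one to the degree of two nodes, hence the division by 2.
    # The cap guards against asking for more links than the network can hold.
    target = min(avg_degree * n / 2, n * (n - 1) / 2)
    n_links = 0
    while n_links < target:
        i = rng.randrange(n)  # ask one-of turtles
        candidates = [j for j in range(n) if j != i and j not in links[i]]
        if candidates:  # corresponds to `if choice != nobody`
            # min-one-of ... [distance myself]: the nearest unlinked node.
            j = min(candidates, key=lambda k: math.dist(coords[i], coords[k]))
            links[i].add(j)
            links[j].add(i)
            n_links += 1
    return coords, links

coords, links = spatially_clustered_network(n=50, avg_degree=6, rng=random.Random(3))
mean_degree = sum(len(v) for v in links.values()) / len(links)
print("mean degree:", mean_degree)
```

Because nodes always connect to their nearest available neighbor, ties cluster in space, which is what gives the network its "neighborhood" structure.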
to go
  if all? turtles [not infected?]
    [ stop ]
  ask turtles
  [
    set virus-check-timer virus-check-timer + 1
    if virus-check-timer >= virus-check-frequency
      [ set virus-check-timer 0 ]
  ]
  spread-virus
  do-virus-checks
  tick
end
to become-infected ;; turtle procedure
  set infected? true
  set resistant? false
  set color red
end

to become-susceptible ;; turtle procedure
  set infected? false
  set resistant? false
  set color green
end

to become-resistant ;; turtle procedure
  set infected? false
  set resistant? true
  set color gray
  ask my-links [ set color gray - 2 ]
end
While this function is short, it is a key stage in the code:
to spread-virus
  ask turtles with [infected?]
    [ ask link-neighbors with [not resistant?]
        [ if random-float 100 < virus-spread-chance
            [ become-infected ] ] ]
end
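A small Python sketch shows what one round of spread-virus does on a three-node chain. The state codes "S", "I", and "R" (susceptible, infected, resistant) and the adjacency-list representation of links are assumptions made for this example.

```python
import random

def spread_virus(state, links, virus_spread_chance, rng):
    """One round of transmission along links. `state` maps node -> 'S', 'I', or 'R'."""
    # Fix the set of spreaders first, so that nodes infected during this
    # round do not pass the virus on until the next round.
    infected_now = [node for node, s in state.items() if s == "I"]
    for node in infected_now:
        for neighbor in links[node]:
            # Resistant neighbors are skipped; every other neighbor is
            # infected with probability virus_spread_chance percent.
            if state[neighbor] != "R" and rng.random() * 100 < virus_spread_chance:
                state[neighbor] = "I"

# A chain 0 - 1 - 2 in which only node 0 starts out infected.
links = {0: [1], 1: [0, 2], 2: [1]}
state = {0: "I", 1: "S", 2: "S"}
rng = random.Random(0)
spread_virus(state, links, 100, rng)
print(state)  # node 1 is now infected; node 2 is still out of reach
spread_virus(state, links, 100, rng)
print(state)  # node 2 is infected in the second round
```

Even with a spread chance of 100 percent, the virus needs two rounds to reach node 2, because it can only travel one link per tick. This is how network structure, and not just the infection probability, shapes the speed of diffusion.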
to do-virus-checks
  ask turtles with [infected? and virus-check-timer = 0]
  [
    if random 100 < recovery-chance
    [
      ifelse random 100 < gain-resistance-chance
        [ become-resistant ]
        [ become-susceptible ]
    ]
  ]
end
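The virus check is a two-stage lottery: an infected agent whose timer has come around may recover, and a recovered agent may or may not gain resistance. A Python sketch of that logic follows; the list-of-dictionaries representation and the "S"/"I"/"R" state codes are assumptions made for this example.

```python
import random

def do_virus_checks(nodes, recovery_chance, gain_resistance_chance, rng):
    """Give every infected node whose timer is at zero a chance to recover."""
    for node in nodes:
        if node["state"] == "I" and node["timer"] == 0:
            # First draw: does the node recover at all?
            if rng.randrange(100) < recovery_chance:
                # Second draw: recover with resistance, or become susceptible again?
                if rng.randrange(100) < gain_resistance_chance:
                    node["state"] = "R"
                else:
                    node["state"] = "S"

rng = random.Random(0)
nodes = [
    {"state": "I", "timer": 0},  # due for a check
    {"state": "I", "timer": 3},  # infected, but not due for a check yet
    {"state": "S", "timer": 0},  # not infected, so the check does nothing
]
do_virus_checks(nodes, recovery_chance=100, gain_resistance_chance=100, rng=rng)
print([node["state"] for node in nodes])
```

With both chances set to 100, only the first node changes state (to resistant), which shows that the timer, and not just the probabilities, controls who can recover on a given tick.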
We need the following sliders:
1) number-of-nodes;
2) average-node-degree;
3) initial-outbreak-size;
4) virus-spread-chance;
5) recovery-chance;
6) gain-resistance-chance;
7) virus-check-frequency.
We need the following buttons:
1) Setup;
2) Go.
Even though it would take a lot of work to write this from scratch, you probably won't have to given how many “sample” models are already out there!
To tweak other people's code, you will first need to understand each step in their code. This site lists all the different functions/commands available within NetLogo:
The downside of NetLogo is that you have to learn new commands and new syntax. The upside is that the brilliant author of this program has saved you from doing much more complicated coding in Java! Be grateful :)
Advantages:
1) No need to move back and forth between NetLogo and R;
2) Faster computation if code is moved to Amazon AWS;
3) Ability to integrate stats functions from R into your simulations.
Disadvantages:
1) Potentially harder to code depending upon how intuitive NetLogo is to you.
From James Kitts:
http://socdynamics.org/id4.html
Lots of great stuff by Carter Butts for you network folks! I also added some of my own code to the “Agent-Based Modeling Resources” folder in the Dropbox.
Social media sites and other internet sites not only provide us with data for descriptive studies, but also the opportunity to conduct carefully controlled studies that help us pin down causal relationships that have long eluded measurement in sociology. But experiments also raise a number of thorny ethical questions that we will discuss after a brief lecture on the nuts and bolts of setting up an experiment.